Prediction of Protein Secondary Structure with two-stage multi-class SVMs

Authors

  • Minh Ngoc Nguyen
  • Jagath C. Rajapakse
Abstract

Bioinformatics techniques for Protein Secondary Structure (PSS) prediction mostly depend on the information available in amino acid sequences. In this paper, we propose a two-stage Multi-class Support Vector Machine (MSVM) approach, in which a second MSVM predictor is placed at the output of the first-stage MSVM to capture the contextual relationships among secondary structure elements and thereby minimise the generalisation error of the prediction. Using position-specific scoring matrices generated by PSI-BLAST, the two-stage MSVM approach achieves Q3 accuracies of 78.0% and 76.3% on the RS126 dataset of 126 non-homologous globular proteins and the CB396 dataset of 396 non-homologous proteins, respectively, which are better than the scores reported on both datasets to date. By using MSVM, the present prediction scheme achieves improvements of 2-6% in Q3 accuracy and 3-15% in Sov accuracy on the two datasets. On larger blind-test datasets from PSIPRED, CASP4 and EVA, the two-stage MSVM approach achieves Q3 accuracies from 77.0% to 79.5%.
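As a rough illustration of two ideas in the abstract, the Python sketch below shows how Q3 accuracy is conventionally computed (the percentage of residues whose three-state label is predicted correctly) and one possible way to build second-stage inputs from first-stage outputs. The abstract does not specify the window width, the padding at chain termini, or the form of the first-stage outputs; the names `second_stage_inputs` and `half_window`, the zero-padding, and the use of three per-residue scores are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence

CLASSES = "HEC"  # three states: helix (H), strand (E), coil (C)


def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the percentage of residues whose H/E/C label is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def second_stage_inputs(
    stage1_scores: Sequence[Sequence[float]], half_window: int
) -> List[List[float]]:
    """Build one feature vector per residue for a hypothetical second stage.

    Each vector concatenates the first-stage scores of the residue and its
    neighbours within `half_window` positions on either side. Positions
    beyond the chain termini are zero-padded (an assumption).
    """
    padding = [0.0] * len(CLASSES)
    features: List[List[float]] = []
    for i in range(len(stage1_scores)):
        row: List[float] = []
        for j in range(i - half_window, i + half_window + 1):
            if 0 <= j < len(stage1_scores):
                row.extend(stage1_scores[j])
            else:
                row.extend(padding)
        features.append(row)
    return features
```

In such a setup, the vectors returned by `second_stage_inputs` would be fed to the second classifier, so that its decision for a residue can depend on the first-stage predictions for neighbouring residues.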

Similar resources

Multi-class support vector machines for protein secondary structure prediction.

The solution of binary classification problems using the Support Vector Machine (SVM) method has been well developed. Though multi-class classification is typically solved by combining several binary classifiers, recently, several multi-class methods that consider all classes at once have been proposed. However, these methods require resolving a much larger optimization problem and are applicab...

Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches

A DNA sequence, which contains all genetic traits, is not itself a functional entity. Instead, it is converted into protein sequences through the processes of transcription and translation. The protein sequence later takes on a 3D structure, which is the functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can think of is undertaken by proteins with specific functio...

Combining protein secondary structure prediction models with ensemble methods of optimal complexity

Many sophisticated methods are currently available to perform protein secondary structure prediction. Since they are frequently based on different principles and different knowledge sources, significant benefits can be expected from combining them. However, the choice of an appropriate combiner appears to be an issue in its own right. The first difficulty to overcome when combining prediction methods...

Secondary structure prediction with support vector machines

MOTIVATION: A new method that uses support vector machines (SVMs) to predict protein secondary structure is described and evaluated. The study is designed to develop a reliable prediction method using an alternative technique and to investigate the applicability of SVMs to this type of bioinformatics problem. METHODS: Binary SVMs are trained to discriminate between two structural classes. The b...

PSP_MCSVM: brainstorming consensus prediction of protein secondary structures using two-stage multiclass support vector machines

Secondary structure prediction is a crucial task for understanding the variety of protein structures and performed biological functions. Prediction of secondary structures for new proteins using their amino acid sequences is of fundamental importance in bioinformatics. We propose a novel technique to predict protein secondary structures based on position-specific scoring matrices (PSSMs) and ph...

Journal title:
  • International journal of data mining and bioinformatics

Volume 1, Issue 3

Pages: -

Publication date: 2007